{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "BipedalWalker_Example.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/AI4Finance-LLC/ElegantRL/blob/master/BipedalWalker_Example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c1gUG3OCJ5GS"
      },
      "source": [
        "# **BipedalWalker-v3 Example in ElegantRL**\r\n",
        "\r\n",
        "\r\n",
        "\r\n",
        "\r\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FGXyBBvL0dR2"
      },
      "source": [
        "# **Part 1: Testing Task Description**\r\n",
        "\r\n",
        "[BipedalWalker-v3](https://gym.openai.com/envs/BipedalWalker-v2/) is a classic task in robotics since it performs one of the most fundamental skills: moving. In this task, our goal is to make a 2D biped walker to walk through rough terrain. BipedalWalker is a difficult task in continuous action space, and there are only a few RL implementations can reach the target reward."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 354
        },
        "id": "-HUVckiDVPhN",
        "outputId": "ea2edb57-2066-4206-fbe0-fb20525efda8"
      },
      "source": [
        "from IPython.display import HTML\r\n",
        "HTML(f\"\"\"<video src={\"https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/BipedalWalker-v2/original.mp4\"} width=500 controls/>\"\"\") # the random demonstration of the task from OpenAI Gym"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<video src=https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/BipedalWalker-v2/original.mp4 width=500 controls/>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 1
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DbamGVHC3AeW"
      },
      "source": [
        "# **Part 2: Install ElegantRL**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "U35bhkUqOqbS",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "e9902d01-191a-4432-e7dc-ed6010c474ff"
      },
      "source": [
        "# install elegantrl library\n",
        "!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git\n",
            "  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-z6u_kgey\n",
            "  Running command git clone -q https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-z6u_kgey\n",
            "Requirement already satisfied: gym in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (0.17.3)\n",
            "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (3.2.2)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (1.19.5)\n",
            "Collecting pybullet\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/6b/b6/719c6e1741fe6126c99d9f3a96fbb9f024ec12a60e6718843f33c7cab1b0/pybullet-3.0.8-cp37-cp37m-manylinux1_x86_64.whl (76.6MB)\n",
            "\u001b[K     |████████████████████████████████| 76.6MB 130kB/s \n",
            "\u001b[?25hRequirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (1.7.1+cu101)\n",
            "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (4.1.2.30)\n",
            "Collecting box2d-py\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/87/34/da5393985c3ff9a76351df6127c275dcb5749ae0abbe8d5210f06d97405d/box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448kB)\n",
            "\u001b[K     |████████████████████████████████| 450kB 45.1MB/s \n",
            "\u001b[?25hRequirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.3.0)\n",
            "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.4.1)\n",
            "Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.5.0)\n",
            "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (2.8.1)\n",
            "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (2.4.7)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (0.10.0)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (1.3.1)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->elegantrl==0.3.1) (3.7.4.3)\n",
            "Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from pyglet<=1.5.0,>=1.4.0->gym->elegantrl==0.3.1) (0.16.0)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->elegantrl==0.3.1) (1.15.0)\n",
            "Building wheels for collected packages: elegantrl\n",
            "  Building wheel for elegantrl (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for elegantrl: filename=elegantrl-0.3.1-cp37-none-any.whl size=77266 sha256=af00644ee85949ff4ebd177ed62dac017c73c4d74955b072c31ddfd3aecaa363\n",
            "  Stored in directory: /tmp/pip-ephem-wheel-cache-xj8x_09r/wheels/d0/f4/2e/cec0c14b57c2094a2bcef3063f95d758ad1309a640ff100419\n",
            "Successfully built elegantrl\n",
            "Installing collected packages: pybullet, box2d-py, elegantrl\n",
            "Successfully installed box2d-py-2.3.8 elegantrl-0.3.1 pybullet-3.0.8\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UVdmpnK_3Zcn"
      },
      "source": [
        "# **Part 3: Import Packages**\r\n",
        "\r\n",
        "\r\n",
        "*   **elegantrl**\r\n",
        "*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.\r\n",
        "*   **PyBullet Gym**: an open-source implementation of the OpenAI Gym MuJoCo environments.\r\n",
        "\r\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1VM1xKujoz-6",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "00bf7120-5d8f-47f4-9fd9-b1de3c1fc075"
      },
      "source": [
        "from elegantrl.run import *\r\n",
        "from elegantrl.agent import AgentGaePPO\r\n",
        "from elegantrl.env import PreprocessEnv\r\n",
        "import gym\r\n",
        "gym.logger.set_level(40) # Block warning"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "___version___==2021-03-03-2120\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3n8zcgcn14uq"
      },
      "source": [
        "# **Part 4: Specify Agent and Environment**\r\n",
        "\r\n",
        "*   **args.agent**: firstly chooses one DRL algorithm to use, and the user is able to choose any agent from agent.py\r\n",
        "*   **args.env**: creates and preprocesses the environment, and the user can either customize own environment or preprocess environments from OpenAI Gym and PyBullet Gym from env.py.\r\n",
        "\r\n",
        "\r\n",
        "> Before finishing initialization of **args**, please see Arguments() in run.py for more details about adjustable hyper-parameters.\r\n",
        "\r\n",
        "\r\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E03f6cTeajK4",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3b4b67dd-cfe5-4ac2-e7eb-e09ec4157f8a"
      },
      "source": [
        "args = Arguments(if_on_policy=False)\r\n",
        "args.agent = AgentGaePPO()  # AgentSAC(), AgentTD3(), AgentDDPG()\r\n",
        "args.env = PreprocessEnv(env=gym.make('BipedalWalker-v3'))\r\n",
        "args.reward_scale = 2 ** -1  # RewardRange: -200 < -150 < 300 < 334\r\n",
        "args.gamma = 0.95\r\n",
        "args.rollout_num = 2 # the number of rollout workers (larger is not always faster)"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "| env_name: BipedalWalker-v3, action space: Continuous\n",
            "| state_dim: 24, action_dim: 4, action_max: 1.0, target_reward: 300\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z1j5kLHF2dhJ"
      },
      "source": [
        "# **Part 5: Train and Evaluate the Agent**\r\n",
        "\r\n",
        "> The training and evaluating processes are all finished inside function **train_and_evaluate__multiprocessing()**, and the only parameter for it is **args**. It includes the fundamental objects in DRL:\r\n",
        "\r\n",
        "*   agent,\r\n",
        "*   environment.\r\n",
        "\r\n",
        "> And it also includes the parameters for training-control:\r\n",
        "\r\n",
        "*   batch_size,\r\n",
        "*   target_step,\r\n",
        "*   reward_scale,\r\n",
        "*   gamma, etc.\r\n",
        "\r\n",
        "> The parameters for evaluation-control:\r\n",
        "\r\n",
        "*   break_step,\r\n",
        "*   random_seed, etc.\r\n",
        "\r\n",
        "\r\n",
        "\r\n",
        "\r\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KGOPSD6da23k",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "358ff860-d67b-4abb-cc6b-1feea70ce6f7"
      },
      "source": [
        "train_and_evaluate__multiprocessing(args) # the training process will terminate once it reaches the target reward."
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "| multiprocessing, act_workers: 2\n",
            "| multiprocessing, None:\n",
            "| GPU id: 0, cwd: ./AgentGaePPO/BipedalWalker-v3_0\n",
            "| Remove history\n",
            "ID      Step      MaxR |    avgR      stdR       objA      objC\n",
            "0   0.00e+00    -91.74 |\n",
            "0   7.90e+04    -91.74 |  -92.27      0.07      -0.52      0.15\n",
            "0   8.92e+04    -79.90 |\n",
            "0   1.20e+05    -79.90 |  -92.16      0.12      -0.52      0.08\n",
            "0   1.39e+05     -9.71 |\n",
            "0   1.49e+05     -9.71 |  -12.75      0.08      -0.53      0.06\n",
            "0   1.83e+05     -9.71 | -115.72      2.81      -0.55      0.08\n",
            "0   2.13e+05     -9.71 | -119.22      1.19      -0.56      0.06\n",
            "0   2.42e+05     -9.71 |  -17.80      0.36      -0.57      0.03\n",
            "0   2.71e+05     -9.71 |  -19.41      0.49      -0.59      0.03\n",
            "0   3.00e+05     -9.71 |  -13.50      1.32      -0.60      0.01\n",
            "0   3.29e+05     -9.71 |  -17.35      0.58      -0.62      0.06\n",
            "0   3.58e+05     -9.71 |  -19.97      0.82      -0.63      0.01\n",
            "0   3.87e+05     -9.71 |  -27.50      4.42      -0.65      0.00\n",
            "0   4.16e+05     -9.71 |  -28.36      1.89      -0.67      0.01\n",
            "0   4.44e+05     -9.71 |  -32.85      1.51      -0.68      0.00\n",
            "0   4.73e+05     -9.71 |  -30.42      0.83      -0.69      0.01\n",
            "0   5.02e+05     -9.71 |  -22.06      5.28      -0.70      0.02\n",
            "0   5.32e+05     -9.71 |  -30.20      1.61      -0.71      0.01\n",
            "0   5.61e+05     -9.71 |  -30.23      1.29      -0.72      0.05\n",
            "0   5.88e+05     -9.71 |  -32.29      0.23      -0.73      0.02\n",
            "0   6.16e+05     -9.71 |  -29.71      1.48      -0.74      0.01\n",
            "0   6.44e+05     -9.71 |  -32.22      2.68      -0.75      0.05\n",
            "0   6.73e+05     -9.71 |  -32.17      0.03      -0.75      0.01\n",
            "0   7.01e+05     -9.71 |  -27.75      3.48      -0.76      0.02\n",
            "0   7.25e+05     -9.71 |  -34.54      0.89      -0.77      0.02\n",
            "0   7.54e+05     -9.71 |  -30.72     11.08      -0.78      0.01\n",
            "0   7.66e+05     -5.90 |\n",
            "0   7.76e+05     -5.90 |  -16.09     25.72      -0.79      0.03\n",
            "0   7.85e+05     39.92 |\n",
            "0   7.97e+05     60.53 |\n",
            "0   8.00e+05     60.53 |   60.53     33.89      -0.80      0.01\n",
            "0   8.24e+05     60.53 |   18.41     52.13      -0.80      0.02\n",
            "0   8.27e+05     76.85 |\n",
            "0   8.31e+05     89.84 |\n",
            "0   8.49e+05     89.84 |  -39.92      0.02      -0.81      0.01\n",
            "0   8.57e+05     95.86 |\n",
            "0   8.69e+05     95.86 |   -2.44     54.69      -0.83      0.01\n",
            "0   8.75e+05    109.39 |\n",
            "0   8.92e+05    109.39 |  103.35      4.45      -0.83      0.01\n",
            "0   8.96e+05    111.21 |\n",
            "0   9.18e+05    111.21 |  -24.02     22.31      -0.84      0.01\n",
            "0   9.21e+05    113.18 |\n",
            "0   9.42e+05    113.18 |  113.02     35.95      -0.86      0.01\n",
            "0   9.42e+05    126.40 |\n",
            "0   9.65e+05    133.99 |\n",
            "0   9.67e+05    133.99 |  133.99      4.64      -0.87      0.01\n",
            "0   9.92e+05    133.99 |   47.88     81.87      -0.88      0.01\n",
            "0   1.02e+06    133.99 |   88.16     68.64      -0.89      0.01\n",
            "0   1.03e+06    140.25 |\n",
            "0   1.04e+06    140.25 |  -27.62      7.86      -0.90      0.01\n",
            "| print_norm: state_avg, state_fix_std\n",
            "| avg = np.array([-0.02011463,  0.02409141, -0.25429018,  0.1098539 ,  2.61902816,\n",
            "        0.03851667,  1.55685594,  0.06934673,  0.0143034 , -2.54944489,\n",
            "       -0.04185725,  1.48792373, -0.07415436,  0.71671522, -0.07187245,\n",
            "       -0.07346577, -0.07427693, -0.07391288, -0.07019069, -0.06310751,\n",
            "       -0.04757434, -0.022208  ,  0.19534317,  0.10130062])\n",
            "| std = np.array([0.17906068, 0.04960189, 0.08074795, 0.05929731, 0.24277669,\n",
            "       0.44746367, 0.39655857, 0.78507916, 0.32997057, 0.19228805,\n",
            "       0.47369757, 0.3228804 , 0.74221032, 0.3182355 , 0.03988492,\n",
            "       0.04021584, 0.04154164, 0.04406695, 0.04814238, 0.05448643,\n",
            "       0.06483812, 0.08356642, 0.08475735, 0.03242376])\n",
            "| SavedDir: ./AgentGaePPO/BipedalWalker-v3_0\n",
            "| UsedTime: 9964\n",
            "[W CudaIPCTypes.cpp:22]← Don't worry about this warning.\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JPXOxLSqh5cP"
      },
      "source": [
        "Understanding the above results::\r\n",
        "*   **Step**: the total training steps.\r\n",
        "*  **MaxR**: the maximum reward.\r\n",
        "*   **avgR**: the average of the rewards.\r\n",
        "*   **stdR**: the standard deviation of the rewards.\r\n",
        "*   **objA**: the objective function value of Actor Network (Policy Network).\r\n",
        "*   **objC**: the objective function value (Q-value)  of Critic Network (Value Network)."
      ]
    }
  ]
}